Search results for "Pattern matching"

showing 10 items of 44 documents

Secure and Privacy Preserving Pattern Matching in Distributed Cloud-based Data Storage

2019

Given two strings: pattern $p$ of length $m$ and text $t$ of length $n$ . The string matching problem is to find all (or some) occurrences of the pattern $p$ in the text $t$ . We introduce a new simple data structure, called index arrays, and design fast privacy-preserving matching algorithm for string matching. The motivation behind introducing index arrays is determined by the need for pattern matching on distributed cloud-based datasets with semi-trusted cloud providers. It is intended to use encrypted index arrays both to improve performance and protect confidentiality and privacy of user data.

021110 strategic defence & security studiesTheoretical computer scienceComputer sciencebusiness.industry0211 other engineering and technologiesCloud computing02 engineering and technologyString searching algorithmData structureEncryptionSimple (abstract algebra)Computer data storagePattern matchingbusinessBlossom algorithm2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)

researchProduct

Reverse-safe data structures for text indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…

050101 languages & linguisticsComputer sciencedata structure02 engineering and technologyprivacySet (abstract data type)combinatoric0202 electrical engineering electronic engineering information engineering0501 psychology and cognitive sciencesPattern matchingSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazionialgorithmSettore INF/01 - Informatica05 social sciencesSearch engine indexingINF/01 - INFORMATICAdata miningData structureMatrix multiplicationcombinatoricsExponent020201 artificial intelligence & image processingdata structure; algorithm; combinatorics; de Bruijn graph; data mining; privacyAlgorithmAdversary modelde Bruijn graphInteger (computer science)

researchProduct

On Combinatorial Generation of Prefix Normal Words

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…

Amortized analysisConjecturePrefix Normal WordBinary numbercombinatorial generation; formal languages; prefix normal words; binary strings; jumbled pattern matching; bubble languages; efficient algorithmsContext (language use)prefix normal wordsData_CODINGANDINFORMATIONTHEORYformal languagesbubble languagesSubstringcombinatorial generationbinary stringsPrefixCombinatoricsjumbled pattern matchingefficient algorithmsPattern matchingAlgorithmsWord (computer architecture)Mathematics

researchProduct

Ancestral Reconstruction and Investigations of Genomic Recombination on some Pentapetalae Chloroplasts

2019

Abstract In this article, we propose a semi-automated method to rebuild genome ancestors of chloroplasts by taking into account gene duplication. Two methods have been used in order to achieve this work: a naked eye investigation using homemade scripts, whose results are considered as a basis of knowledge, and a dynamic programming based approach similar to Needleman-Wunsch. The latter fundamentally uses the Gestalt pattern matching method of sequence matcher to evaluate the occurrences probability of each gene in the last common ancestor of two given genomes. The two approaches have been applied on chloroplastic genomes from Apiales, Asterales, and Fabids orders, the latter belonging to Pe…

Ancestral reconstructionMost recent common ancestor0206 medical engineeringGenomic recombination02 engineering and technology[INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]Dynamic programmingGenome[INFO.INFO-IU]Computer Science [cs]/Ubiquitous ComputingEvolution Molecular[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]AsteralesGene duplication0202 electrical engineering electronic engineering information engineeringPattern matchingGenome ChloroplastRosaceaeResearch ArticlesPhylogenySequence (medicine)Recombination GeneticbiologyGeneral Medicinebiology.organism_classification[INFO.INFO-MO]Computer Science [cs]/Modeling and SimulationAncestral genome reconstructionApialesEvolutionary biology[INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]020201 artificial intelligence & image processing[INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET][INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]Pentapetalae chloroplasts020602 bioinformaticsTP248.13-248.65BiotechnologyJournal of Integrative Bioinformatics

researchProduct

Discovering representative models in large time series databases

2004

The discovery of frequently occurring patterns in a time series could be important in several application contexts. As an example, the analysis of frequent patterns in biomedical observations could allow to perform diagnosis and/or prognosis. Moreover, the efficient discovery of frequent patterns may play an important role in several data mining tasks such as association rule discovery, clustering and classification. However, in order to identify interesting repetitions, it is necessary to allow errors in the matching patterns; in this context, it is difficult to select one pattern particularly suited to represent the set of similar ones, whereas modelling this set with a single model could…

Association rule learningDiscretizationComputer scienceContext (language use)Correlation and dependencecomputer.software_genreSet (abstract data type)CardinalityKnowledge extractionMotif extraction Pattern discoveryPattern matchingData miningCluster analysisTime complexitycomputer

researchProduct

Pattern Matching and Pattern Discovery Algorithms for Protein Topologies

2001

We describe algorithms for pattern-matching and pattern-learning in TOPS diagrams (formal descriptions of protein topologies). These problems can be reduced to checking for subgraph isomorphism and finding maximal common subgraphs in a restricted class of ordered graphs. We have developed a subgraph isomorphism algorithm for ordered graphs, which performs well on the given set of data. The maximal common subgraph problem then is solved by repeated subgraph extension and checking for isomorphisms. Despite its apparent inefficiency, this approach yields an algorithm with time complexity proportional to the number of graphs in the input set and is still practical on the given set of data. As a…

CombinatoricsDiscrete mathematicsSubgraph isomorphism problemMaximal independent setInduced subgraph isomorphism problemPattern matchingFast methodsNetwork topologyTime complexityAlgorithmMaximum common subgraph isomorphism problemMathematicsofComputing_DISCRETEMATHEMATICSMathematics

researchProduct

Linear-size suffix tries

2016

Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w of length n = | w | , a suffix tree for w takes O ( n ) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ ( n 2 ) nodes and lin…

Compressed suffix arrayGeneral Computer ScienceSuffix tree[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix tree0102 computer and information sciences02 engineering and technologyData_CODINGANDINFORMATIONTHEORYText indexing01 natural sciencesY-fast trielaw.inventionLongest common substring problemTheoretical Computer ScienceCombinatoricsSuffix treelawFactor and suffix automata0202 electrical engineering electronic engineering information engineeringData_FILESArithmeticFactor and suffix automata; Pattern matching; Suffix tree; Text indexing; Theoretical Computer Science; Computer Science (all)Pattern matchingMathematicsSettore INF/01 - InformaticaX-fast trieComputer Science (all)LCP array010201 computation theory & mathematics020201 artificial intelligence & image processingFM-index

researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer sciencegenome sequenceGeneral Mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]General Physics and AstronomyContext (language use)algorithmscomputer.software_genrePattern matchingSequenceSearch engine indexingGeneral EngineeringWildcard characterArticlescomputer.file_formatConstruct (python library)Data structuremapping readspattern matchingComputingMethodologies_DOCUMENTANDTEXTPROCESSINGData mining[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Focus (optics)mismatchcomputerAlgorithmindexingPhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

researchProduct

Cognitive Mimetics for Designing Intelligent Technologies

2018

Design mimetics is an important method of creation in technology design. Here, we review design mimetics as a plausible approach to address the problem of how to design generally intelligent technology. We argue that design mimetics can be conceptually divided into three levels based on the source of imitation. Biomimetics focuses on the structural similarities between systems in nature and technical solutions for solving design problems. In robotics, the sensory-motor systems of humans and animals are a source of design solutions. At the highest level, we introduce the concept of cognitive mimetics, in which the source for imitation is human information processing. We review and discuss so…

Computer sciencemedia_common.quotation_subjectdesigningInteraction designlcsh:QA75.5-76.95050105 experimental psychology03 medical and health sciences0302 clinical medicineHuman–computer interactioncognitive mimetics0501 psychology and cognitive sciencesPattern matchingjäljittelymedia_commonDesign technologybusiness.industry05 social sciencesInformation processingRoboticsCognitionHuman-Computer Interactionintelligent technologiessuunnittelumimesiskognitiivinen jäljittelyälytekniikkalcsh:Electronic computers. Computer scienceArtificial intelligenceBiomimeticsbusinessImitation030217 neurology & neurosurgery

researchProduct

Multi-Dimensional Pattern Matching with Dimensional Wildcards: Data Structures and Optimal On-Line Search Algorithms

1997

We introduce a new multidimensional pattern matching problem that is a natural generalization of string matching, a well studied problem1. The motivation for its algorithmic study is mainly theoretical. LetA1:n1,?,1:nd be a text matrix withN=n1?ndentries andB1:m1,?,1:mr be a pattern matrix withM=m1?mrentries, whered?r?1 (the matrix entries are taken from an ordered alphabet ?). We study the problem of checking whether somer-dimensional submatrix ofAis equal toB(i.e., adecisionquery).Acan be preprocessed andBis given on-line. We define a new data structure for preprocessingAand propose CRCW-PRAM algorithms that build it inO(logN) time withN2/nmaxprocessors, wherenmax=max(n1,?,nd), such that …

Control and OptimizationSuffix treeBlock matrixWildcard characterString searching algorithmcomputer.file_formatData structurelaw.inventionCombinatoricsComputational MathematicsMatrix (mathematics)Computational Theory and MathematicsSearch algorithmlawPattern matchingcomputerMathematicsJournal of Algorithms

researchProduct